Dissolved gas analysis (DGA) is widely accepted as an effective method to detect incipient faults
within power transformers. Gases such as hydrogen, methane, acetylene, ethylene and ethane are
normally used to identify transformer fault conditions. Several techniques have been developed to
interpret DGA results, such as the key gas method, the Doernenburg, Rogers and International
Electrotechnical Commission (IEC) ratio-based methods, Duval triangles, and the latest Duval
pentagon methods. However, each of these approaches depends on the experts' shared knowledge and
experience rather than quantitative scientific methods, so different diagnoses may be reported for
the same oil sample. To overcome these shortcomings, this paper proposes the use of a decision tree
method to interpret the transformer health condition based on DGA results. The proposed decision
tree model employs three main fault gases (methane, acetylene and ethylene) as inputs and classifies
the transformer into eight fault conditions. The J48 algorithm is used to train and develop the
decision tree model. The performance of the proposed model is validated against the pre-known
condition of transformers and compared with the Duval triangle method (DTM). Results show that the
proposed model predicts transformer fault conditions with better precision and accuracy than DTM,
at 81% and 69% respectively.

Keywords: Dissolved gas analysis, Doernenburg ratio method, J48, Transformer

1. INTRODUCTION

Dissolved gas analysis (DGA) has been used extensively to assess the condition of power transformers.
Paper and oil decompose under high thermal and electrical stress on the transformer insulation
system, producing gases that dissolve in the oil and decrease its dielectric strength [1]. Paper
decomposition yields carbon monoxide (CO) and carbon dioxide (CO2). Hydrogen (H2), methane (CH4),
acetylene (C2H2), ethylene (C2H4), and ethane (C2H6), on the other hand, are produced by oil
decomposition and the formation of faults [2], [3]. Every fault produces characteristic gases that can be
used to identify the fault and measure its severity [4], [5]. A low-energy fault such as partial discharge
produces H2 and CH4, while a high-energy fault such as arcing can produce all gases, including C2H2.
A high thermal fault, on the other hand, produces C2H4 and C2H6 [1]. Thus, the nature of the fault can be
determined from the type of gas generated. However, the analysis is not always straightforward because
more than one fault may occur at the same time [6].

Based on DGA findings, various interpretation methods such as the key gas method (KGM),
Doernenburg ratio method (DRM), Rogers ratio method (RRM), International Electrotechnical Commission
(IEC) ratio method (IRM), Duval triangle method (DTM) [7], and Duval pentagon method (DPM) [8] have
been established to determine the transformer's state. However, most of these approaches rely on DGA
test statistics interpreted by professionals to provide knowledge-based diagnostic recommendations,
while others are based on the theory of thermodynamics, so they do not necessarily lead to the same
conclusion for the same oil sample [9]. Several soft computing methods have been suggested to overcome
these limitations.

These include artificial neural networks (ANN) [10]-[12], support vector machines (SVM) [13], [14]
and fuzzy logic (FL) [15]-[17]. However, each of these methods also has limitations. ANN requires a
large number of data samples and a long training time to achieve consistent performance, and is prone
to overfitting [18]. Developing fuzzy rules and membership functions is tedious, and fuzzy outputs can
be interpreted in a variety of ways, which complicates the analysis. In addition to being
computationally expensive and complex, the key issue with SVM is the selection of the right kernel
function [19], since different kernel functions give different results. This paper proposes another
machine learning method, the J48 decision tree, to interpret DGA findings; it offers a relatively
quicker and less complex algorithm than SVM. Additionally, the structure of a J48 decision tree is
more comprehensible than an ANN architecture.


2. DECISION TREE 

Decision trees are among the most effective methods in data mining for building classification
systems based on multiple covariates or for designing predictive algorithms for a target variable.
The method is frequently used in numerous applications since it is user-friendly, straightforward,
and robust even when missing values are present. In the decision tree method, a population is
classified into branch-like segments that form an inverted tree with a root node, internal nodes,
and leaf nodes, as shown in Figure 1 [20].

Figure 1. Decision tree model


A root node, also called a decision node, represents a decision that subdivides all records into
two or more mutually exclusive subsets. Outcomes of decisions that contain no further branches are
known as leaf nodes, and each leaf node carries the label of a particular class. The decision points
in the tree structure that lie between the root node and the leaf nodes are called internal nodes.
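
The node types described above can be sketched as a small data structure. The following is only an
illustration, not the model developed in this paper: the class names `Leaf` and `Decision`, the
function `classify`, and the thresholds in `toy_tree` are all hypothetical, and the thresholds carry
no diagnostic meaning.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: carries the class label and has no further branches."""
    label: str


@dataclass
class Decision:
    """Root or internal node: tests one attribute against a threshold."""
    attribute: str      # name of the attribute tested at this node
    threshold: float    # records with value <= threshold follow the left branch
    left: "Node"
    right: "Node"


Node = Union[Leaf, Decision]


def classify(node: Node, record: Dict[str, float]) -> str:
    """Walk from the root to a leaf and return that leaf's class label."""
    while isinstance(node, Decision):
        if record[node.attribute] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hypothetical toy tree; the thresholds are invented for illustration only.
toy_tree = Decision(
    "C2H2", 1.0,
    left=Decision("C2H4", 50.0, left=Leaf("T1"), right=Leaf("T3")),
    right=Leaf("D2"),
)

print(classify(toy_tree, {"C2H2": 0.0, "C2H4": 10.0}))
```

Here the outer `Decision` plays the role of the root node, the nested `Decision` is an internal
node, and each `Leaf` is a leaf node.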

Several decision tree algorithms such as ID3, J48, CART, C5.0, SLIQ, SPRINT, random forest, and
random tree have been developed for classification [21]. The ID3, J48, and C5.0 algorithms implement
the top-down decision tree construction concept, while the CART algorithm is based on binary
decision tree construction [22]. In this work, the J48 decision tree algorithm is chosen to classify
the fault types of the transformer.

The J48 decision tree, or C4.5 algorithm, developed by Ross Quinlan is an extension of the ID3
algorithm that allows the target value of new test data to be decided with respect to the attribute
values of the training data [20]. It improves on ID3 by handling both continuous and discrete
attributes, handling missing values, and pruning trees after construction. The J48 algorithm uses a
top-down greedy search through the given sets to test each attribute at every tree node [23].
As a supervised learning algorithm, J48 requires a set of example data consisting of relationships
between input objects and the desired output value in order to develop the decision tree model [24].
This dataset is used for training. J48 decision tree induction begins with a root node representing
the entire dataset and recursively separates the data into smaller subsets by testing a given
attribute at each node. The root node is picked as the attribute with the highest gain value among
all attributes, while each split is chosen by considering the degree of ‘purity’ of the resulting
subsets. This process is repeated until the subsets are “pure”, that is, all instances in a subset
fall within the same class, at which point tree growing is terminated. Where the stopping rules do
not work well, pruning is conducted to decrease the classification errors [25]. Pruning removes
unnecessary nodes from the tree in order to obtain the optimal decision tree and to prevent
overfitting or underfitting rules from being developed. The process of decision tree development
using the J48 algorithm is summarized in Figure 2.

Figure 2. Flowchart of J48 (C4.5) algorithm
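
The attribute selection step can be illustrated with a short calculation. The sketch below computes
the entropy-based gain ratio of a binary split on one continuous attribute, which is the kind of
criterion C4.5 is generally described as using. It is a simplified illustration under stated
assumptions: the function names and the four-sample dataset are hypothetical, candidate thresholds
are simply the observed values, and further refinements of C4.5 (pruning, missing-value handling)
are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting into value <= threshold and value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    total = len(labels)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    gain = entropy(labels) - weighted  # information gain of the split
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Return the observed value giving the highest gain ratio, or None."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split the data
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical C2H2 readings (ppm) with known fault labels, for illustration only.
c2h2 = [0.0, 0.5, 2.0, 30.0]
fault = ["NF", "NF", "D2", "D2"]
print(best_threshold(c2h2, fault))
```

In this toy case a split at 0.5 produces two pure subsets, so it is selected; on real data the same
comparison would be repeated for every attribute at every node.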


3. J48 DECISION TREE MODEL

To establish a DGA interpretation model, a total of 500 data samples collected from various
operating transformers of different operating conditions, ages and health conditions were used to
train the J48 decision tree algorithm. Instead of using all fault gases, the proposed model
concentrates on only three main gases, CH4, C2H2, and C2H4, as input attributes to interpret the
transformer condition. These are the same three gases used in the DTM. The output variable of the
model, which represents the transformer health condition, is classified into eight (8) categories as
listed in Table 1.

The process of developing the DGA interpretation model is summarized in Figure 3. The process
began by training on a set of 500 transformer data samples with known fault conditions using the J48
algorithm to obtain the decision tree model. These 500 samples cover all the fault categories stated
in Table 1. Training was performed using 10-fold cross-validation to increase the effectiveness of
the proposed interpretation model.
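
As an illustration of the 10-fold procedure, the sketch below partitions 500 sample indices into ten
folds and yields one train/test split per fold. It is a plain (unstratified) split written with the
standard library only; the function name `k_fold_indices` and the fixed random seed are assumptions
made for this example and do not describe the tooling actually used in this work.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With 500 samples and 10 folds, each round trains on 450 samples and tests on 50.
for train, test in k_fold_indices(500, k=10):
    assert len(train) == 450 and len(test) == 50
```

Every sample is used for testing exactly once, so the reported performance is an average over ten
models, each evaluated on data it was not trained on.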


Table 1. Proposed transformer fault classifications
Fault id   Fault description
NF         No fault
PD         Partial discharge
D1         Low energy discharge
D2         Arcing
DT         Electrical-thermal
T1         Low thermal fault
T2         Medium thermal fault
T3         High thermal fault


[Flowchart: J48 decision tree algorithm, testing the model performance, 80% correct, transformer
fault interpretation model ready]

Figure 3. Process of developing the proposed interpretation model


The decision tree model generated with the J48 algorithm shows that, among the three gases, C2H2
is the main attribute for the transformer faults, and it is therefore selected as the root node of
the model. The developed tree has a size of 139 and consists of 70 leaves (leaf nodes). The
developed decision tree model correctly classified 83.8% of the transformer fault types, that is,
419 out of 500 datasets, with a mean absolute error (MAE) of 0.0619 and a root mean squared error
(RMSE) of 0.1759. Detailed prediction results are tabulated as a matrix in Table 2. It can be seen
that the proposed interpretation model successfully classified each transformer fault type at an
average of more than 80%, except for D1, which is a bit lower.


Table 2. Training results of J48 decision tree model
                  J48 decision tree prediction
Actual     D1    D2    DT    NF    PD    T1    T2    T3
D1         32    16     0     1     1     1     0     0
D2          3   138     0     4     1     1     1     2
DT          0     1     8     1     0     0     0     0
NF          0     0     1    39     0     2     0     0
PD          1     2     0     1    30     3     0     0
T1          3     0     0     3     3    93     1     7
T2          1     0     0     0     0     3    13     4
T3          1     2     2     0     0     7     1    66


After the best possible decision tree model has been obtained, an additional 100 datasets of known
transformer fault types are used to further evaluate the performance of the proposed model. The
proposed interpretation model must achieve at least 80% accuracy in classifying the overall
transformer fault types before it is ready to be used. Otherwise, the model is modified and the
training process is repeated until it achieves 80% prediction accuracy. Out of the 100 datasets, the
proposed model correctly classified 81 transformer faults, as shown in Table 3, which is equivalent
to 81% accuracy and hence surpasses the agreed minimum requirement.


Table 3. Validation results of the proposed J48 decision tree model
                  J48 decision tree prediction
Actual     D1    D2    DT    NF    PD    T1    T2    T3
D1          9     2     0     0     1     0     0     0
D2          1    12     0     1     0     0     0     0
DT          0     1     8     1     1     0     0     1
NF          1     0     0    10     1     0     0     0
PD          0     0     0     0    12     0     0     0
T1          0     0     0     0     0    13     1     0
T2          0     0     0     0     0     1     8     3
T3          0     0     1     0     0     2     0     9


4. RESULTS AND DISCUSSION 

In this section, the performance of the proposed interpretation model is compared with the DTM
(shown in Figure 4), which is recognized by industry as the best interpretation technique so far.
Although a later improvement of the DTM, the DPM, is available, it only works as a complement to the
existing DTM and does not replace it [8]. To evaluate the performance of both methods, another set
of 65 transformer data samples with known fault conditions was used to examine the prediction
accuracy. A confusion matrix is employed to analyze the performance of both methods in classifying
the fault types. The confusion matrix is a table that reports the number of true positives (TP),
true negatives (TN), false positives (FP), and false negatives (FN), which permits visualization of
the classification accuracy and the performance of the method. These terms are defined as follows:

i) TP: cases correctly predicted as Yes
ii) TN: cases correctly predicted as No
iii) FP: cases predicted as Yes that are actually No
iv) FN: cases predicted as No that are actually Yes.
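
For a multi-class problem, these four counts are usually obtained per class by treating that class
as Yes and all other classes as No. The sketch below shows one way to do this. The function name
`one_vs_rest_counts`, the orientation of the matrix (rows as actual classes, columns as predicted
classes) and the 3-class matrix itself are assumptions made for illustration; the matrix is not
taken from the results of this paper.

```python
def one_vs_rest_counts(matrix, k):
    """Return (TP, TN, FP, FN) for class index k.

    Assumed orientation: matrix[i][j] is the number of cases whose actual
    class is i and whose predicted class is j.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # actual k, predicted k
    fn = sum(matrix[k]) - tp                   # actual k, predicted as another class
    fp = sum(row[k] for row in matrix) - tp    # another class, predicted as k
    tn = total - tp - fn - fp                  # everything not involving class k
    return tp, tn, fp, fn


# Hypothetical 3-class confusion matrix (20 cases), for illustration only.
example = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 1, 4],
]
print(one_vs_rest_counts(example, 0))  # → (5, 12, 2, 1)
```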

Precision, recall, and F-measure are used to examine the classification performance. Precision
quantifies the proportion of positive class predictions that actually belong to the positive class,
while recall quantifies the proportion of all positive examples in the dataset that are predicted as
positive. The F-measure provides a single score that balances the concerns of precision and recall
in one number. The accuracy of a classifier refers to the probability that the method correctly
predicts the actual fault of the transformer. The precision, recall, F-measure, and accuracy can be
computed as:

$$\text{precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{2}$$

$$F\text{-measure} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{3}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4}$$
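
The four measures can be evaluated directly from the counts. The following sketch implements
equations (1) to (4) as written; the function name `classification_metrics`, the convention of
returning 0.0 when a denominator is zero, and the example counts are assumptions made for
illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (precision, recall, F-measure, accuracy) from the four counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # equation (1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # equation (2)
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)  # equation (3)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)         # equation (4)
    return precision, recall, f_measure, accuracy


# Hypothetical counts for one class out of 20 cases, for illustration only.
p, r, f, a = classification_metrics(tp=5, tn=12, fp=2, fn=1)
print(round(p, 3), round(r, 3), round(f, 3), round(a, 3))  # → 0.714 0.833 0.769 0.85
```

Because TN is counted in equation (4), a per-class accuracy computed this way is typically much
higher than precision or recall when there are many classes. The per-class accuracy values reported
in Table 6 appear instead to match TP / (TP + FP + FN), that is, without TN, so they are not
directly comparable with the output of this sketch.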

Figure 4. Duval triangle method diagram
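
In the DTM, a sample is located in the triangle by the relative percentages of the three gases, for
example %CH4 = 100 × CH4 / (CH4 + C2H4 + C2H2). The sketch below computes these coordinates only;
the fault-zone boundaries of the triangle are not reproduced here. The function name
`duval_percentages` and the gas concentrations in the example are hypothetical.

```python
def duval_percentages(ch4, c2h4, c2h2):
    """Return (%CH4, %C2H4, %C2H2) for gas concentrations given in ppm."""
    total = ch4 + c2h4 + c2h2
    if total <= 0:
        raise ValueError("at least one gas concentration must be positive")
    return (100.0 * ch4 / total, 100.0 * c2h4 / total, 100.0 * c2h2 / total)


# Hypothetical sample: 60 ppm CH4, 30 ppm C2H4, 10 ppm C2H2.
print(duval_percentages(60.0, 30.0, 10.0))  # → (60.0, 30.0, 10.0)
```

The three percentages always sum to 100, which is what allows a three-gas sample to be drawn as a
single point inside a triangle.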


Table 4 and Table 5 show the confusion matrices obtained for the DTM and the J48 decision tree
model respectively for the 65 transformer datasets. According to Table 4, the DTM successfully
diagnosed 42 out of 65 cases, while the remaining cases were wrongly classified. The J48 model gives
a better prediction, with 53 out of 65 cases correctly classified, as shown in Table 5. Further
analysis is shown in Table 6, where the precision, recall, F-measure, and accuracy for each fault
class are analyzed.


Table 4. DTM confusion matrix
                  DTM prediction
Actual     D1    D2    DT    NF    PD    T1    T2    T3
D1          8     0     0     0     0     0     2     0
D2          3     8     3     0     0     0     0     1
DT          0     0     2     0     0     0     1     0
NF          0     0     0     0     0     1     2     1
PD          0     0     2     0     1     0     0     1
T1          0     0     1     0     1     6     1     2
T2          0     0     0     0     0     0     6     0
T3          0     0     0     0     0     0     1    11


Based on Table 4, the DTM precisely classifies the actual T2 fault (correctly classifying 6 out
of 6). However, it also frequently misinterprets other faults as T2, which reduces the recall and
accuracy of the DTM in classifying the T2 fault. In contrast, for T1 the DTM manages to correctly
classify only 6 out of 11 cases, but there is only 1 case in which the DTM wrongly predicts T1.
Therefore, the accuracy of the DTM in classifying the T1 fault is higher than for T2. The results
also show that the fault most reliably classified by the DTM is T3, with an F-measure and accuracy
of 0.79 and 0.65 respectively. The overall accuracy of the DTM in classifying the fault types is
only 40%.

In contrast, the proposed J48 decision tree model achieves an average precision of 81% across the
fault types. The class predicted most precisely by the J48 model is NF, with 100% (4 out of 4)
correct, followed by T3 with 92% (11 out of 12). Unlike the DTM, the J48 model generates more
consistent interpretation results, with an average recall of about 83%. The lowest recall is for T2,
because two cases that are supposed to be T1 and T3 are wrongly classified as T2. The proposed J48
model also shows better overall accuracy than the DTM, at 69%.


Table 5. The proposed model confusion matrix
                  J48 decision tree prediction
Actual     D1    D2    DT    NF    PD    T1    T2    T3
D1          8     1     0     0     1     0     0     0
D2          2    12     0     1     0     0     0     0
DT          0     0     2     0     0     0     0     1
NF          0     0     0     4     0     0     0     0
PD          1     0     0     0     3     0     0     0
T1          0     1     0     0     0     8     1     1
T2          0     1     0     0     0     0     5     0
T3          0     0     0     0     0     0     1    11

Table 6. Precision, recall, F-measure and accuracy results comparison between DTM and the proposed
J48 model
Fault       Precision       Recall          F-measure       Accuracy
types       DTM     J48     DTM     J48     DTM     J48     DTM     J48
D1          0.80    0.80    0.73    0.73    0.76    0.76    0.62    0.62
D2          0.53    0.80    1.00    0.80    0.70    0.80    0.53    0.67
DT          0.67    0.67    0.25    1.00    0.36    0.80    0.22    0.67
NF          0.00    1.00    0.00    0.80    0.00    0.89    0.00    0.80
PD          0.25    0.75    0.50    0.75    0.33    0.75    0.20    0.60
T1          0.55    0.73    0.86    1.00    0.67    0.84    0.50    0.73
T2          1.00    0.83    0.46    0.71    0.63    0.77    0.46    0.63
T3          0.92    0.92    0.69    0.85    0.79    0.88    0.65    0.79
Average     0.59    0.81    0.56    0.83    0.53    0.81    0.40    0.69


5. CONCLUSION 

This paper proposes a J48 decision tree model to interpret transformer fault types based on
dissolved gas analysis data. The proposed model has been developed using a set of historical
transformer data with pre-known health conditions. Three fault gases, CH4, C2H4, and C2H2, are
selected as inputs to the model, which classifies the transformer into eight fault classifications.
The performance of the proposed model is evaluated using another sixty-five datasets and compared
with the Duval triangle method. Although the proposed model shows superior performance to the DTM,
its accuracy can be improved further by considering more DGA samples during the training phase.
Adding other fault gases such as H2 and C2H6 also has the potential to enhance the model accuracy.
However, doing so may also increase the tree size and introduce overfitting issues if not considered
carefully.